Generating An Entailment Corpus From News Headlines
نویسندگان
چکیده
We describe our efforts to generate a large (100,000 instance) corpus of textual entailment pairs from the lead paragraph and headline of news articles. We manually inspected a small set of news stories in order to locate the most productive source of entailments, then built an annotation interface for rapid manual evaluation of further exemplars. With this training data we built an SVM-based document classifier, which we used for corpus refinement purposes—we believe that roughly three-quarters of the resulting corpus are genuine entailment pairs. We also discuss the difficulties inherent in manual entailment judgment, and suggest ways to ameliorate some of these.
منابع مشابه
Paraphrasing Headlines by Machine Translation
In this paper we investigate the automatic collection, generation and evaluation of sentential paraphrases. Valuable sources of paraphrases are news article headlines; they tend to describe the same event in various different ways, and can easily be obtained from the web. We describe a method for generating paraphrases by using a large aligned monolingual corpus of news headlines acquired autom...
متن کاملContrastive Analysis of Political News Headlines Translation According to Berman’s Deformative Forces
The present research aimed at investigating the deformation of political news headlines translation between English and Persian News Agencies based on Berman`s deformative system. For this purpose, 100 news headlines in English were selected from BBC, Reuters, Associated Press, France, France 24, Financial Times, Business Times, New York Times, Politico, Guardian, CNN, Bloomberg, Middle East Ey...
متن کاملGenerating News Headlines with Recurrent Neural Networks
We describe an application of an encoder-decoder recurrent neural network with LSTM units and attention to generating headlines from the text of news articles. We find that the model is quite effective at concisely paraphrasing news articles. Furthermore, we study how the neural network decides which input words to pay attention to, and specifically we identify the function of the different neu...
متن کاملAutomatic Extraction of News Values from Headline Text
Headlines play a crucial role in attracting audiences’ attention to online artefacts (e.g. news articles, videos, blogs). The ability to carry out an automatic, largescale analysis of headlines is critical to facilitate the selection and prioritisation of a large volume of digital content. In journalism studies news content has been extensively studied using manually annotated news values – fac...
متن کاملTranslation of News Headlines
Machine-Translation of news headlines is difficult since the sentences are fragmentary and abbreviations and acronyms of proper names are frequently used. Another difficulty is that, since the headline comes at the top of a news article, the context information useful to disambiguate the sense of words and to determine their translation(target word) is not available. This paper proposes a new a...
متن کامل